Most research on clustering ensembles focuses on designing practical consistency learning algorithms. Because the quality of base clusters varies and low-quality base clusters degrade the performance of the clustering ensemble, the intrinsic connections among the data were mined from the base clusters, from the perspective of data mining, and a high-order information fusion algorithm was proposed to represent these connections from different dimensions, namely Clustering Ensemble with High-order Consensus learning (HCLCE). Firstly, each kind of high-order information was fused into a new structured consistency matrix. Then, the resulting consistency matrices were fused together. Finally, all of the information was fused into a single consistent result. Experimental results show that, compared with the second-best algorithm, Locally Weighted Evidence Accumulation (LWEA), the HCLCE algorithm improves clustering accuracy by 7.22% on average and Normalized Mutual Information (NMI) by 9.19% on average. The proposed algorithm therefore obtains better clustering results than both the compared clustering ensemble algorithms and the use of any single kind of information alone.
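The idea of fusing first-order and higher-order connections can be illustrated with a minimal sketch. It is not the HCLCE algorithm itself: it assumes that the first-order information is the usual co-association matrix (the fraction of base clusterings in which two samples share a cluster), that second-order information is obtained from the matrix product of the co-association matrix with itself, and that fusion is a plain average. The function names `co_association`, `second_order` and `fuse` are hypothetical.

```python
def co_association(base_clusterings):
    """Fraction of base clusterings that put samples i and j in the same cluster."""
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    return [[sum(labels[i] == labels[j] for labels in base_clusterings) / m
             for j in range(n)] for i in range(n)]


def second_order(ca):
    """Assumed second-order similarity: CA x CA, rescaled so the largest entry is 1."""
    n = len(ca)
    prod = [[sum(ca[i][k] * ca[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    peak = max(max(row) for row in prod)  # > 0 because the diagonal of CA is 1
    return [[v / peak for v in row] for row in prod]


def fuse(matrices):
    """Assumed fusion rule: element-wise average of the similarity matrices."""
    n = len(matrices[0])
    return [[sum(mat[i][j] for mat in matrices) / len(matrices)
             for j in range(n)] for i in range(n)]


# Three base clusterings of four samples (label lists are illustrative data).
base = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
ca = co_association(base)
so = second_order(ca)
fused = fuse([ca, so])
```

In this toy input, samples 0 and 3 never share a cluster, so their co-association is zero, yet their second-order similarity is nonzero because both are linked to sample 1. This is the kind of indirect connection that higher-order information is meant to expose.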
The cutting stock problem for circular parts arises widely in manufacturing industries. To maximize material utilization within a reasonable computing time, a new parallel genetic algorithm for cutting stock was proposed, namely the Parallel Genetic Blanking Algorithm (PGBA). In PGBA, the material utilization rate of the cutting plan was used as the optimization objective function, and multithreading was used to perform genetic operations on multiple subpopulations in parallel. Firstly, a specific individual encoding method was designed on the basis of the parallel genetic algorithm, and a heuristic method was used to generate the individuals of the population, so as to improve the search ability and efficiency of the algorithm and to avoid premature convergence. Then, a near-optimal cutting plan was found by adaptive genetic operations with better performance. Finally, the effectiveness of the algorithm was verified experimentally. The results show that, compared with the heuristic algorithm proposed in the literature, PGBA takes longer to compute but greatly improves the material utilization rate, which can effectively improve the economic benefits of enterprises.
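The objective function and the parallel treatment of subpopulations can be sketched as follows. This is an illustration under stated assumptions, not the encoding used by PGBA: an individual is assumed to be an ordering of part radii, a simple shelf-packing decoder places each circle by its bounding square on a strip of fixed width, and one thread evaluates each subpopulation. The names `used_length`, `utilization`, `best_of_subpopulation` and `evaluate_parallel` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def used_length(order, width):
    """Shelf-pack circles by their bounding squares; return the strip length used."""
    length = 0.0   # total height of the shelves already closed
    shelf_h = 0.0  # height of the shelf currently being filled
    x = 0.0        # horizontal position in the current shelf
    for r in order:
        d = 2 * r
        if d > width:
            raise ValueError("part is wider than the stock")
        if x + d > width:      # part does not fit: close the shelf, open a new one
            length += shelf_h
            shelf_h, x = 0.0, 0.0
        x += d
        shelf_h = max(shelf_h, d)
    return length + shelf_h


def utilization(order, width):
    """Material utilization rate: total part area / stock area consumed."""
    area = sum(math.pi * r * r for r in order)
    return area / (width * used_length(order, width))


def best_of_subpopulation(subpop, width):
    """Fittest individual (highest utilization) of one subpopulation."""
    return max(subpop, key=lambda ind: utilization(ind, width))


def evaluate_parallel(subpops, width):
    """Evaluate every subpopulation in its own thread, as in an island model."""
    with ThreadPoolExecutor(max_workers=len(subpops)) as pool:
        return list(pool.map(lambda s: best_of_subpopulation(s, width), subpops))


# Two subpopulations of orderings of the same parts (illustrative data).
subpops = [[[1, 2, 1], [2, 1, 1]], [[1, 1, 2], [1, 2, 1]]]
winners = evaluate_parallel(subpops, width=4)
```

A full genetic algorithm would add selection, crossover and mutation on each subpopulation, plus occasional migration between them. Note also that in CPython, threads do not speed up pure-Python arithmetic, so the thread pool here only illustrates the structure.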
Scientists identify whale species from the shape and distinctive marks of whale tails, but recognition by eye and manual labeling are very cumbersome. Whale tail photo datasets have an unbalanced data distribution, and some categories contain very few samples or even a single sample. In addition, the samples differ only slightly between individuals and include unknown categories, which makes automatic labeling for whale identification by image classification difficult. To address the difficulty of classification by metric learning in this task, on the basis of the Siamese Neural Network (SNN), training batches were constructed dynamically with a Linear Assignment Problem (LAP) algorithm during training with hard-negative sample mining. Firstly, image feature vectors were extracted from the training samples, and the similarity metric between feature vectors was calculated. Then, LAP was used to assign sample pairs to the model, training batches were constructed dynamically according to the metric score matrix, and the difficult sample pairs were trained on in a targeted manner. Experimental results on a whale tail image dataset with unbalanced data distribution and on the CUB-200-2011 dataset show that the proposed algorithm achieves good results in learning minority classes and in classifying fine-grained images.
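The assignment of hard-negative pairs from a score matrix can be sketched as follows. This is a minimal illustration, not the authors' training code: it assumes that a higher score means the model judges two images more similar, that a pair is a valid negative only when the labels differ, and that the goal is to maximize the total score of the assigned pairs. A brute-force search over permutations stands in for a proper LAP solver such as the Hungarian algorithm, so it is only practical for a handful of samples. The name `hard_negative_assignment` is hypothetical.

```python
from itertools import permutations

FORBIDDEN = float("-inf")  # marks same-class pairs, which are not negatives


def hard_negative_assignment(scores, labels):
    """Give each sample one different-class partner, maximizing total similarity."""
    n = len(labels)

    def gain(i, j):
        return FORBIDDEN if labels[i] == labels[j] else scores[i][j]

    best_total, best_perm = FORBIDDEN, None
    for perm in permutations(range(n)):  # exhaustive search, small n only
        total = sum(gain(i, j) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    if best_perm is None:
        raise ValueError("no assignment with only different-class pairs exists")
    return list(enumerate(best_perm))


# Similarity scores for four images from two classes (illustrative data).
labels = ["a", "a", "b", "b"]
scores = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.2, 0.7],
    [0.8, 0.2, 1.0, 0.9],
    [0.1, 0.7, 0.9, 1.0],
]
pairs = hard_negative_assignment(scores, labels)
```

Each sample is paired with the different-class image it is most easily confused with, subject to every image being used once as a partner. Recomputing the score matrix as the model improves yields batches that change dynamically during training.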
Traditional image segmentation methods based on graph theory process the grayscale values of an image directly to obtain clustering results, but their computing time is very long. A novel segmentation method based on graph partitioning of histogram clusters is presented. The proposed algorithm obtains the threshold by clustering the histogram potential function. Since the input is histogram data, the computation time is not affected by the image size. Experimental results demonstrate that the proposed algorithm significantly reduces the computation time.
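Why working on the histogram decouples the cost from the image size can be shown with a short sketch. The abstract does not define the histogram potential function, so Otsu's between-class variance criterion is swapped in here as a well-known two-class histogram clustering rule; it is not the proposed method. The function names `histogram` and `otsu_threshold` are hypothetical.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each gray level (one pass over the image)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def otsu_threshold(hist):
    """Threshold t maximizing between-class variance; class 0 is levels <= t."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of gray levels in class 0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(hist):  # loop length is the number of gray levels
        w0 += count
        sum0 += t * count
        if w0 == 0:
            continue  # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break     # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# A tiny bimodal "image" given as a flat list of gray levels (illustrative data).
pixels = [10] * 50 + [200] * 50
t_small = otsu_threshold(histogram(pixels))
t_large = otsu_threshold(histogram(pixels * 100))  # 100 times more pixels
```

Building the histogram still takes one pass over the pixels, but the threshold search loops over at most 256 gray levels regardless of how large the image is, and both calls above return the same threshold.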